 human advice


Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance

arXiv.org Artificial Intelligence

Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a trade-off between the certainty and timeliness of human advice, our method takes a step towards RL-driven human-in-the-loop engineering methods.

Modeling activities are often more complex than an atomic model transformation (MT) and rely on sequences of MTs. Pertinent examples can be found in model synchronization [1], model refactoring [2], and rule-based design-space exploration [3]. Typically, there might be more than one MT sequence that can successfully transform the source model into the target state, and choosing the most appropriate (cost-effective, efficient, safe) one manually is not tractable. This raises the need for automated methods for developing complex MTs, in which MTs are chained in sequences.


Does AI and Human Advice Mitigate Punishment for Selfish Behavior? An Experiment on AI ethics From a Psychological Perspective

arXiv.org Artificial Intelligence

People increasingly rely on AI advice when making decisions. At times, such advice can promote selfish behavior. When individuals abide by selfishness-promoting AI advice, how are they perceived and punished? To study this question, we build on theories from social psychology and combine machine-behavior and behavioral economic approaches. In a pre-registered, financially incentivized experiment, evaluators could punish real decision-makers who (i) received AI, human, or no advice. The advice (ii) encouraged selfish or prosocial behavior, and decision-makers (iii) behaved selfishly or, in a control condition, behaved prosocially. Evaluators further assigned responsibility to decision-makers and their advisors. Results revealed that (i) prosocial behavior was punished very little, whereas selfish behavior was punished much more. Focusing on selfish behavior, (ii) compared to receiving no advice, selfish behavior was penalized more harshly after prosocial advice and more leniently after selfish advice. Lastly, (iii) whereas selfish decision-makers were seen as more responsible when they followed AI compared to human advice, punishment between the two advice sources did not vary. Overall, behavior and advice content shape punishment, whereas the advice source does not.


Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop

arXiv.org Artificial Intelligence

Human-in-the-loop (HiL) reinforcement learning is gaining traction in domains with large action and state spaces, and sparse rewards by allowing the agent to take advice from HiL. Beyond advice accommodation, a sequential decision-making agent must be able to express the extent to which it was able to utilize the human advice. Subsequently, the agent should provide a means for the HiL to inspect parts of advice that it had to reject in favor of the overall environment objective. We introduce the problem of Advice-Conformance Verification which requires reinforcement learning (RL) agents to provide assurances to the human in the loop regarding how much of their advice is being conformed to. We then propose a Tree-based lingua-franca to support this communication, called a Preference Tree. We study two cases of good and bad advice scenarios in MuJoCo's Humanoid environment. Through our experiments, we show that our method can provide an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice. Finally, we present a human-user study with 20 participants that validates our method.


Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions

arXiv.org Artificial Intelligence

In many applications of AI, the algorithm's output is framed as a suggestion to a human user. The user may ignore the advice or take it into consideration to modify his/her decisions. With the increasing prevalence of such human-AI interactions, it is important to understand how users act (or do not act) upon AI advice, and how users regard advice differently if they believe the advice comes from an "AI" versus another human. In this paper, we characterize how humans use AI suggestions relative to equivalent suggestions from a group of peer humans across several experimental settings. We find that participants' beliefs about human versus AI performance on a given task affect whether or not they heed the advice. When participants decide to use the advice, they do so similarly for human and AI suggestions. These results provide insights into factors that affect human-AI interactions.


Influencing Reinforcement Learning through Natural Language Guidance

arXiv.org Artificial Intelligence

Interactive reinforcement learning agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, this information can be difficult to generalize on its own. In this work, we explore how natural language advice can be used to provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known interactive reinforcement learning technique. Usually, policy shaping employs a human feedback policy to help an agent learn more about how to achieve its goal. In our case, we replace this human feedback policy with a policy generated from natural language advice. We aim to inspect whether the generated natural language reasoning helps a deep reinforcement learning agent decide its actions successfully in any given environment. To that end, we design our model with three networks: the first is experience driven, the second is the advice generator, and the third is advice driven. The experience-driven reinforcement learning agent chooses its actions under the influence of the environmental reward, while the advice-driven network uses the feedback that the advice generator produces for each new state to select actions that assist the reinforcement learning agent with policy shaping.
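The combination step at the heart of policy shaping can be sketched as follows. This is a minimal illustration, not the authors' three-network model: it assumes the agent's policy and the advice-derived policy are both available as probability lists over the same discrete actions, and it combines them by element-wise multiplication followed by renormalization, as in standard policy shaping. The function name `shape_policy` and the fallback behavior are hypothetical choices made for this sketch.

```python
from typing import List, Sequence


def shape_policy(agent_probs: Sequence[float],
                 advice_probs: Sequence[float]) -> List[float]:
    """Combine two action distributions by multiplying and renormalizing.

    agent_probs:  action probabilities from the experience-driven agent.
    advice_probs: action probabilities derived from advice (hypothetical
                  input; how it is produced is outside this sketch).
    """
    if len(agent_probs) != len(advice_probs):
        raise ValueError("both distributions must cover the same actions")

    # Element-wise product: actions favored by both sources gain weight.
    combined = [a * h for a, h in zip(agent_probs, advice_probs)]
    total = sum(combined)

    # Assumed fallback: if the two sources share no support, ignore the
    # advice and return the agent's own (renormalized) distribution.
    if total <= 0.0:
        agent_total = sum(agent_probs)
        return [a / agent_total for a in agent_probs]

    return [c / total for c in combined]


if __name__ == "__main__":
    agent = [0.5, 0.3, 0.2]
    advice = [0.1, 0.8, 0.1]
    print(shape_policy(agent, advice))
```

In this example the agent alone prefers the first action, but the advice shifts most of the probability mass to the second action, which is the effect policy shaping is meant to have when the advice is informative.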


Decision-makers Processing of AI Algorithmic Advice: Automation Bias versus Selective Adherence

arXiv.org Artificial Intelligence

Artificial intelligence algorithms are increasingly adopted as decisional aides by public organisations, with the promise of overcoming biases of human decision-makers. At the same time, the use of algorithms may introduce new biases in the human-algorithm interaction. A key concern emerging from psychology studies regards human overreliance on algorithmic advice even in the face of warning signals and contradictory information from other sources (automation bias). A second concern regards decision-makers' inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes (selective adherence). To date, we lack rigorous empirical evidence about the prevalence of these biases in a public sector context. We assess these via two pre-registered experimental studies (N=1,509), simulating the use of algorithmic advice in decisions pertaining to the employment of school teachers in the Netherlands. In study 1, we test automation bias by exploring participants' adherence to a prediction of teachers' performance, which contradicts additional evidence, while comparing between two types of predictions: algorithmic v. human-expert. We do not find evidence for automation bias. In study 2, we replicate these findings, and we also test selective adherence by manipulating the teachers' ethnic background. We find a propensity for adherence when the advice predicts low performance for a teacher of a negatively stereotyped ethnic minority, with no significant differences between algorithmic and human advice. Overall, our findings of selective, biased adherence belie the promise of neutrality that has propelled algorithm use in the public sector.


Researchers challenge AI to give advice as well as humans on Reddit can

#artificialintelligence

Researchers in Seattle have introduced what they call a new AI grand challenge called TuringAdvice, which is centered on creating language models that generate helpful advice for humans using real-world language. The TuringAdvice challenge is based on the dynamic RedditAdvice data set. Created for the challenge, RedditAdvice is a crowdsourced data set of advice shared in the past two weeks that got the most upvotes in Reddit subcommunities. To pass the challenge, a machine must deliver advice as helpful as or better than popular human advice. As part of the TuringAdvice launch, the researchers also released a static RedditAdvice 2019 data set for training advice-giving AI models, which includes 616,000 pieces of advice from 188,000 situations shared by people in Reddit subcommunities.


We asked 8 top wealth management execs to predict the future of human advice, roboadvisers, and fees. Here are their full responses to our survey.

#artificialintelligence

By 2030, consumers will be using a platform that not only intelligently manages their investments, but also incorporates their ongoing cash management into their advice and automation. You won't have to think about how much money you should keep in your checking or savings accounts. You shouldn't have to think about which account to pay your bills out of. We'll automate your entire financial life so you can spend more of your time doing the things that actually make you happy. As a result, we would expect to see things like budgeting tools and programs phased out.


How to Optimize Search Engine Strategies for Machine Learning Trend

#artificialintelligence

Machine learning is a method of analyzing data automatically. It is a form of artificial intelligence that learns patterns in data and makes decisions with little human advice. Artificial intelligence is the practice of making computers perform tasks that require intelligence.


Directed Policy Gradient for Safe Reinforcement Learning with Human Advice

arXiv.org Machine Learning

Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be they co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a posteriori cannot immediately prevent it from wrong-doing. In this paper, we extend Policy Gradient to make it robust to external directives that would otherwise break the fundamentally on-policy nature of Policy Gradient. Our technique, Directed Policy Gradient (DPG), allows a teacher or backup policy to override the agent before it acts undesirably, while allowing the agent to leverage human advice or directives to learn faster. Our experiments demonstrate that DPG makes the agent learn much faster than reward-based approaches, while requiring an order of magnitude less advice.